A set of probabilistic predictions is well calibrated if the events that are predicted to occur with probability p do in fact occur about a p fraction of the time. Well-calibrated predictions are particularly important when machine learning models are used in decision analysis. This paper presents two new non-parametric methods for calibrating the outputs of binary classification models: a method based on Bayes optimal selection and a method based on Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn the predictive model and can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show that the new methods either outperform or are comparable in performance to state-of-the-art calibration methods.
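The notion of calibration stated above can be made concrete with a simple binning check: group predictions by predicted probability and compare each bin's mean prediction with the observed event frequency. The sketch below is an illustrative, hypothetical helper (a standard expected calibration error computation), not one of the paper's proposed methods.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error: bin predictions and compare each bin's
    mean predicted probability with its observed event frequency.
    Illustrative only; not the paper's Bayesian calibration methods."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last bin is closed on the right so probability 1.0 is included.
        if hi < 1.0:
            mask = (probs >= lo) & (probs < hi)
        else:
            mask = (probs >= lo) & (probs <= hi)
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the bin's share of samples
    return ece
```

On a perfectly calibrated sample (e.g. predictions of 0.5 for events that occur half the time) the error is zero; on a model that always predicts 0.9 for events that never occur, the error is 0.9.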